Early-return sample #366

stephanos · 2024-10-02T17:56:00Z

What was changed

Why?

Checklist

Closes
How was this tested:

Any docs updates needed?

early-return/workflow.go

early-return/starter/main.go

early-return/activity.go

cretz

Would recommend tests when this becomes a non-draft PR

cretz · 2024-10-03T12:18:19Z

early-return/workflow.go

+		UpdateName,
+		func(ctx workflow.Context) error {
+			condition := func() bool { return initDone }
+			if completed, err := workflow.AwaitWithTimeout(ctx, earlyReturnTimeout, condition); err != nil {


I think you should leave how long a caller is willing to wait for the initial update up to them unless it's really important to differentiate start-to-update timeout from schedule-to-update timeout.

Okay; I thought there was a risk that the caller might accidentally wait indefinitely if they don't specify a deadline on their end. But I just turned off the worker, and the ExecuteWorkflow request times out after 10s, even though I'm using context.Background(). I wasn't aware that timeout existed.

I'm learning a lot about writing workflows right now.

cretz · 2024-10-03T12:19:09Z

early-return/workflow.go

+		return err
+	}
+
+	// Phase 1: Initialize the transaction synchronously.


Arguably this logic could be flipped and users may prefer that in many scenarios. Can flip where the update is the init and the primary workflow waits for an init update before continuing. There are tradeoffs to both.

In all my examples until now, I've actually had it flipped. Drew convinced me to do it the other way around, but I'm not quite sure anymore why. What makes you say that the other way might be more preferable?

Updates are not durable when admitted, which is why we did it this way. If I remember correctly, this is the only safe way to write this type of workflow.

I don't necessarily think it's more preferable, there are just tradeoffs. The main tradeoff is probably what you want the workflow to do when it's not called via update with start. If you want it to function normally, no problem, if you want it to wait for an update to get it moving, probably want logic flipped.

It isn't more preferable; this is the only safe way to write it, since update with start is not transactional.

My main reasoning is what chad said: you want the workflow to function properly when the client doesn't call it with an update.
Quinn's reasoning makes sense too.

It isn't more preferable; this is the only safe way to write it, since update with start is not transactional.

Do we at least guarantee the update and the start are in the same task? If we don't, all latency bets are off anyways. But whether primary workflow waits on init from update or update waits on init from primary workflow is immaterial I'd think (except if the update can come in a separate task which would be a concern).

We do guarantee it for Update-with-Start.

👍 Then yeah I think it's probably just semantics on which coroutine waits on the other and probably doesn't matter

Another strong reason to do it this way is that, if the update did the init, then the workflow author has to make sure the workflow is correct in the face of multiple calls to the update handler, i.e. normal updates being sent subsequent to the update with start. But with all steps in the main workflow, multiple calls to the update handler are automatically correct.

cretz · 2024-10-03T12:21:20Z

early-return/workflow.go

+		logger.Info("cancelling transaction due to error: %v", initErr)
+
+		// Transaction failed to be initialized or not quickly enough; cancel the transaction.
+		return workflow.ExecuteActivity(activityCtx, CancelTransaction, tx).Get(ctx, nil)


It's not common practice to swallow an error by logging it at info level and then possibly returning success. Usually you would want to mark the workflow failed for various observability reasons.

I see your point. My only worry is that users wouldn't be able to distinguish between "failed to init" and "failed to cancel/complete" - which might require very different actions. At the same time, they would probably have some kind of monitoring themselves?

What I mean here is that if CancelTransaction succeeds, operators have no observability into the failure because you have logged the failure as info and did not fail the workflow. Usually with compensating actions, you want to propagate the original failure, not log-and-swallow. But of course it's up to user preference on whether they want to never fail the workflow on failed transaction, but I think most do want to.

cretz · 2024-10-03T12:22:39Z

early-return/workflow.go

+	var initErr error
+	var initDone bool
+	logger := workflow.GetLogger(ctx)
+
+	if err := workflow.SetUpdateHandler(
+		ctx,
+		UpdateName,
+		func(ctx workflow.Context) error {


Feel free to make a struct with your state and your run as a method and your update handler as a method instead of all in one function. What is here is fine of course, but usually when workflows branch out to handlers and many anonymous functions, it is clearer to use traditional structs with method declarations.

Good idea 👍 Only part I don't quite follow is the "run as a method". Is that possible in the Go SDK? I saw an error when trying to register the workflow method of the struct and couldn't find any example doing that (only for activities).

Only part I don't quite follow is the "run as a method". Is that possible in the Go SDK?

You'd do the wrapping w/ a one-liner, so something like:

type myWorkflow struct {
	SomeState
}

func MyWorkflow(ctx workflow.Context, someState SomeState) (*SomeResult, error) {
	// Take the address so the pointer-receiver methods below can be called.
	return (&myWorkflow{someState}).run(ctx)
}

func (m *myWorkflow) run(ctx workflow.Context) (*SomeResult, error) {
	// This kind of setup could go into a newMyWorkflow(...) call instead of in here
	if err := workflow.SetUpdateHandler(ctx, "myUpdate", m.myUpdate); err != nil {
		return nil, err
	}
	panic("TODO")
}

func (m *myWorkflow) myUpdate(ctx workflow.Context, someParam SomeParam) (*SomeUpdateResult, error) {
	panic("TODO")
}

👍 I thought I might have missed a trick to do it with a single method

cretz · 2024-10-03T12:23:35Z

early-return/workflow.go

+)
+
+var (
+	activityTimeout    = 2 * time.Second


I understand we want to demonstrate low latency, but this is a pretty aggressive timeout. But it is probably ok.

Yeah, the idea for the sample is to push the low latency story; and this is actually on the upper end of what I'm hearing for customer use cases.

If that's the case, may not want a default retry policy with an initial interval of a second. May actually want max attempts as 1.

I would be as lax as you can on these timeouts. Don't want to unnecessarily fail requests that would have otherwise succeeded. So if the overall workflow task timeout is 10s, maybe give it 9 or 10s?

cretz · 2024-10-03T12:26:28Z

early-return/workflow.go

+	ID          string
+	FromAccount string
+	ToAccount   string
+	Amount      float64


Usually a sin these days when talking about money to use floats

I had the same thought at first, but I figured it's more intuitive this way in the context of a sample? If this has shifted and I didn't get the memo, I'm happy to change it. I didn't want to add extra complexity/confusion.

I think int would be better, I doubt it adds complexity. We do this in our tutorial too at https://github.com/temporalio/money-transfer-project-template-go/blob/2bb1672af07cb76d449f14beb046e412f44a7afb/shared.go#L12

I see 👍 I'll change it. I thought this would go into the idea of representing cents, too, but the linked example just uses "250" without any denomination.

I'd be ok if you documented that it was in cents or added Cents to the field or something
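
A small sketch of the "integer cents" suggestion, assuming a hypothetical `TransferRequest` struct and `formatCents` helper (neither name is from the PR). It also shows why floats are a poor fit for money.

```go
package main

import "fmt"

// TransferRequest is a hypothetical variant of the sample's struct that
// stores the amount as a whole number of cents instead of a float64.
type TransferRequest struct {
	ID          string
	FromAccount string
	ToAccount   string
	AmountCents int64 // e.g. 250 means $2.50
}

// formatCents renders a non-negative cent amount as dollars for display only.
func formatCents(cents int64) string {
	return fmt.Sprintf("$%d.%02d", cents/100, cents%100)
}

func main() {
	// Float arithmetic drifts: ten cents plus twenty cents is not thirty.
	// Variables are used on purpose, since Go evaluates untyped constant
	// expressions exactly.
	a, b := 0.1, 0.2
	fmt.Println(a+b == 0.3) // false

	// Integer cents add exactly.
	req := TransferRequest{ID: "tx-1", FromAccount: "A", ToAccount: "B", AmountCents: 10 + 20}
	fmt.Println(req.AmountCents == 30, formatCents(req.AmountCents))
}
```

Putting the unit in the field name, as suggested above, removes the ambiguity about what a bare "250" means.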

cretz · 2024-10-03T12:27:53Z

early-return/workflow.go

+
+	// Phase 2: Complete or cancel the transaction asychronously.
+	activityCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
+		StartToCloseTimeout: 10 * time.Second,


It is confusing that the activityTimeout global is used only once, and only for the local activity timeout, while the value here is hardcoded rather than global even though it is the actual activity timeout. Arguably there is no need for these single-use globals instead of just inlining, but if there is, consider consistently using globals for these.

Right; I've just made them global to make it easier to see at a glance what the timeouts are without reading line-by-line.

But you've only made some global and the global name is ambiguous because it's not general activity timeout (that's hardcoded right here), it's init transaction timeout.

cretz · 2024-10-03T12:28:20Z

early-return/workflow.go

+	// By using a local activity, an additional server roundtrip is avoided.
+	// See https://docs.temporal.io/activities#local-activity for more details.
+
+	activityOptions := workflow.WithLocalActivityOptions(ctx, workflow.LocalActivityOptions{


I doubt you'll want the default retry options with such an aggressive schedule to close of 2s

drewhoskins-temporal · 2024-10-03T19:36:32Z

early-return/workflow.go

+
+	// Phase 2: Complete or cancel the transaction asychronously.
+	activityCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
+		StartToCloseTimeout: 10 * time.Second,


nit: can you use a different, longer timeout here? since this is the async part, and I think 10s was used elsewhere? I think 30s is a fairly standard timeout for a rando activity.

drewhoskins-temporal · 2024-10-03T19:50:11Z

early-return/workflow.go

+	UpdateName         = "early-return"
+	TaskQueueName      = "early-return-tq"
+	activityTimeout    = 2 * time.Second
+	earlyReturnTimeout = 5 * time.Second


This is aggressive. Our usual advice is to have more generous timeouts but to monitor latencies and keep them low.
Think about sitting at a computer waiting for an operation to complete. If it was going to take more than 5 seconds, would you want it to fail saying "maybe that worked" ? Or would you rather wait longer? You'd probably be willing to wait a little longer.
Many client-facing RPC servers time out after around 30 seconds, and so these sorts of timeouts can be calibrated to be shorter than that.
Here's a draft of a comment:

One common heuristic: Calibrate this number relative to your overall remaining client timeout. So, if your client will timeout in 29 more seconds, you might choose 28s to give time to return and report the correct error.
In general, err on the generous side so as not to fail operations that would have succeeded.

Curious if @cretz 's advice would be similar.

Completely situational I think. No strong opinion. Arguably the caller should determine how long they're willing to wait. A timer inside a workflow does not account for, say, the workflow being slow to start.

I've removed the "Await" timeout (earlyReturnTimeout) (see other convo).

And I've bumped the local activity timeout to 5s now; and the async activity to 30s.

stephanos · 2024-10-04T01:27:00Z

early-return/workflow.go

+
+func (tx *Transaction) ReturnInitResult(ctx workflow.Context) error {
+	if err := workflow.Await(ctx, func() bool { return tx.initDone }); err != nil {
+		return fmt.Errorf("transaction init cancelled: %w", err)


AFAICT, this is the only untested line of the workflow. I'm not quite sure how to use the Go SDK testing env to trigger this case.

You would need to test cancellation using https://pkg.go.dev/go.temporal.io/sdk/internal#TestWorkflowEnvironment.CancelWorkflow, no? Not saying you need to for this sample though.

Ah, that's the missing piece! Yeah, I agree, it's prob fine without.

Quinn-With-Two-Ns · 2024-10-04T16:08:40Z

We should link this sample from the readme as we do for all samples https://github.com/temporalio/samples-go/blob/main/README.md

Quinn-With-Two-Ns · 2024-10-04T16:10:05Z

early-return/workflow_test.go

+	env.RegisterActivity(tx.CompleteTransaction)
+
+	uc := &updateCallback{}
+	env.RegisterDelayedCallback(func() {


nit: would add a comment explaining this will guarantee the update is sent in the first WFT.

FWIW I'm confused even after Quinn's comment. Why is this test seemingly not using the normal interface for UwS?
I assume you have a good reason related to the Go test library and that there's nothing fixable in this PR. But I'm wondering if there's something we can improve as part of UwS public preview or GA.

Is it not backed by the Java test service?

Correct; the Go SDK has its own time-skipping test server, and it has its own APIs. We haven't added a UwS API since the approach here works, too. But I agree, it's not immediately obvious (at least it wasn't for me; I had to ask Quinn).

stephanos force-pushed the earlyreturn branch 2 times, most recently from 63b3064 to 6a7a54d on October 2, 2024 18:11

Early-return sample

7c726e9

stephanos force-pushed the earlyreturn branch from 6a7a54d to 7c726e9 on October 2, 2024 19:27

drewhoskins-temporal reviewed Oct 2, 2024

early-return/workflow.go (outdated)

rename timeout

509449f

Quinn-With-Two-Ns reviewed Oct 2, 2024

early-return/workflow.go (outdated)

Quinn-With-Two-Ns reviewed Oct 2, 2024

early-return/starter/main.go (outdated)

Quinn-With-Two-Ns reviewed Oct 2, 2024

early-return/activity.go (outdated)

stephanos added 3 commits October 2, 2024 20:13

use struct

3565530

tweak log

9b4a47a

sleep in activities

733a0b6

cretz reviewed Oct 3, 2024

make const

ecd78c8

drewhoskins-temporal reviewed Oct 3, 2024

address comments

2097b86

stephanos commented Oct 4, 2024

remove unused field

a0d812c

stephanos force-pushed the earlyreturn branch from 4dbc622 to 4420300 on October 4, 2024 01:48

stephanos marked this pull request as ready for review October 4, 2024 15:43

tweak ScheduleToCloseTimeout

d400b38

stephanos force-pushed the earlyreturn branch from 4420300 to d400b38 on October 4, 2024 15:59

Quinn-With-Two-Ns reviewed Oct 4, 2024

Quinn-With-Two-Ns approved these changes Oct 4, 2024

stephanos added 3 commits October 4, 2024 09:46

add README entry

c509f54

add RegisterDelayedCallback note

3c68ea4

run as method

eac75d8

stephanos enabled auto-merge (squash) October 4, 2024 17:27

Merge branch 'main' into earlyreturn

fb87344

stephanos merged commit 428f8e4 into temporalio:main Oct 4, 2024
3 checks passed
